Identification of the Isotherm Function in Chromatography 

Using CMA-ES 



M. Jebalia^, A. Auger^, M. Schoenauer^, F. James^, M.Postel^ 



Abstract — This paper deals with the identification of the 
flux for a system of conservation laws in the specific example 
of analytic chromatography. The fundamental equations of 
chromatographic process are highly non linear. The state-of-the- 
art Evolution Strategy, CMA-ES (the Covariance Matrix Adap- 
tation Evolution Strategy), is used to identify the parameters 
of the so-called isotherm function. The approach was validated 
on different configurations of simulated data using either one, 
two or three components mixtures. CMA-ES is then applied 
to real data cases and its results are compared to those of a 
gradient-based strategy. 

I. Introduction 

The chromatography process is a powerful tool to separate 
or analyze mixtures ||6l. It is widely used in chemical industry 
(pharmaceutical, perfume and oil industry, etc) to produce 
relatively high quantities of very pure components. This is 
achieved by taking advantage of the selective absorption of 
the different components in a solid porous medium. The 
moving fluid mixture is percolated through the motionless 
medium in a column. The various components of the mixture 
propagate in the column at different speeds, because of 
their different affinities with the solid medium. The art of 
chromatography separation requires predicting the different 
proportions of every component of the mixture at the end of 
the column (called the chwmatogram) during the experiment. 
In the ideal (linear) case, every component has its own 
fixed propagation speed, that does not depend on the other 
components. In this case, if the column is sufficiently long, 
pure components come out at the end of the column at 
different times: they are perfectly separated. But in the real 
world, the speed of a component heavily depends on every 
other component in the mixture. Hence, the fundamental 
Partial Differential Equations of the chromatographic pro- 
cess, derived from the mass balance, are highly non linear 
The process is governed by a nonlinear function of the 
mixture concentrations, the so-called Isotherm Function. This 
function computes the amount of absorbed quantity of each 
component w.rt. all other components. 

Mathematically speaking, thermodynamical properties of 
the isotherm ensure that the resulting system of PDEs 
is hyperbolic, and standard numerical tools for hyperbolic 
systems can hence be applied; if the isotherm is known: 
The precise knowledge of the isotherm is crucial, both from 
the theoretical viewpoint of physico-chemical modeling and 
regarding the more practical preoccupation of accurately 
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controlling the experiment to improve separation. Specific 
chromatographic techniques can be used to directly identify 
the isotherm, but gathering a few points requires several 
months of careful experiments. Another possible approach 
to isotherm identification consists in solving the inverse 
problem numerically: find the isotherm such that numerical 
simulations result in chromatograms that are as close as 
possible to the actual experimental outputs. 

This paper introduces an evolutionary method to tackle 
the identification of the isotherm function from experimental 
chromatograms. The goal of the identification is to minimize 
the difference between the actual experimental chromatogram 
and the chromatogram that results from the numerical sim- 
ulation of the chromatographic process. Chemical scientists 
have introduced several parametric models for isotherm func- 
tions (see |6| for all details of the most important models). 
The resulting optimization problem hence amounts to para- 
metric optimization, that is addressed here using the state-of- 
the-art Evolution Strategy, CMA-ES. Section HI] introduces 
the direct problem and Section [III] the optimization (or 
inverse) problem. Section HV-AI reviews previous approaches 
to the problem based on gradient optimization algorithms 
[fT3l, JTS]. Section HV^ details the CMA-ES method and 
the implementation used here. Finally, Section |V] presents 
experimental results: first, simulated data are used to validate 
the proposed approach; second, real data are used to compare 
the evolutionary approach with a gradient-based method. 

II. Physical problem and model 

Chromatography aims at separating the components of 
a mixture based on the selective absorption of chemical 
species by a solid porous medium. The fluid mixture moves 
down through a column of length L, considered here to be 
one-dimensional. The various components of the mixture 
propagate in the column at different speeds, because of 
their different behavior when interacting with the porous 
medium. At a given time t G K+, for a given z G [0, L] the 
concentration of m species is a real vector of R™ denoted 
c(t, z). The evolution of c is governed by the following 
partial differential equation: 

d,c + atF(c) = 0, 

c(0,z) =Co(z), (1) 

C(t, 0) = Cinj{t). 

where Cq : K — > R™ is the initial concentration, Cinj : M 
R™ the injected concentration at the entrance of the column 
and F : M™ M'" is the flux function that can be expressed 



in the following way 



-H(c) 



where H : M™ M™ is the so-called isotherm function, 
e S (0, 1) and u G lfT2l . The Jacobian matrix of F being 
diagonalizable with strictly positive eigenvalues, the system 
([TJ is strictly hyperbolic and thus admits an unique solution 
as soon as F is continuously differentiable, and the initial and 
injection conditions are piecewise continuous. The solution 
of Eq.[T] can be approximated using any finite difference 
method that is suitable for hyperbolic systems |5 |. A uniform 
grid in space and time of size (A'+l) x (A^+1) is defined: Let 
Az (resp. At) such that KAz = L (resp. NAt = T). Then 
an approximation of the solution of Eq.[T]can be computed 
with the Godunov scheme: 

Cfe+i=cI!-^(F«)-F(cri)) (2) 

where is an approximation of the mean value of the 
solution c at point {kAz,nAtj^ For a fixed value of 
the solution of Eq.|2]converges to the solution of Eq.[T]as At 
and Az converge to zero. The numerical scheme given in Eq. 
|2] is numerically stable under the so-called CFL condition 
stating that the largest absolute value of the eigenvalues of 
the Jacobian matrix of F is upper-bounded by a constant 



Az 

— maxSp(|F'(c)|) < CFL < L 
At c 



(3) 



III. The Optimization Problem 

A. Goal 

The goal is to identify the isotherm function from exper- 
imental chromatograms: given initial data cq, injection data 
Cinj, and the corresponding experimental chromatogram Cexp 
(that can be either the result of a simulation using a known 
isotherm function, or the result of actual experiments by 
chemical scientists), find the isotherm function H such that 
the numerical solution of Eq. [T] using the same initial and 
injection conditions results in a chromatogram as close as 
possible to the experimental one c^xp- 

Ideally, the goal is to find H such that the following system 
of PDEs has a unique solution c{t, z): 

( d,c + dtF{c) = 0, 

c(0,z) = Co{z), 

C{t,0) = Cinj{t), 
c{t,L) = Cexp{t). 

However, because in most real-world cases this system will 
not have an exact solution, it is turned into a minimization 
problem. For a given isotherm function H, solve system [T] 
and define the cost function J as the least square differ- 
ence between the computed chromatogram Cii{t, L) and the 
experimental one c^xpit): 

J(H)-/ \\cuit,L)-Cexpm^dt (5) 



(4) 



If many experimental chromatograms are provided, the cost 
function is the sum of such functions J computed for each 
experimental chromatogram. 

B. Search Space 

When tackling a function identification problem, the first 
issue to address is the parametric vs non-parametric choice 
|16|: parametric models for the target function result in 
parametric optimization problems that are generally easier 
to tackle - but a bad choice of the model can hinder the 
optimization. On the other hand, non-parametric models are a 
priori less biased, but search algorithms are also less efficient 
on large unstructured search space. 

Early trials to solve the chromatography inverse problem 
using a non-parametric model (recurrent neural-network) 
have brought a proof-of-concept to such approach [4J, but 
have also demonstrated its limits: only limited precision 
could be reached, and the approach poorly scaled up with 
the number of components of the mixture. 

Fortunately, chemists provide a whole zoology of 
parametrized models for the isotherm function H, and using 
such models, the identification problem amounts to para- 
metric optimization. For i G {!,..., m}, denote the 
component i of the function H. The main models for the 
isotherm function that will be used here are the following: 

> The Langmuir isotherm lfT4l assumes that the different 
components are in competition to occupy each site of 
the porous medium. This gives, for all i = 1, . . . ,m 



H,(c) = 



N* 



K,:C,: 



(6) 



There are m + 1 positive parameters: the Langmuir 
coefficients (Ki)jg[i m], homogeneous to the inverse of 
a concentration, and the saturation coefficient N* that 
corresponds to some limit concentration. 
The Bi-Langmuir isotherm generalizes the Langmuir 
isotherm by assuming two different kinds of sites on 
the absorbing medium. The resulting equations are, for 
all i = 1 , . . . , TO 



se{i,2} 



1 + E/" 1 K/^sC; 



K, 



(7) 



'Mean value over the volume defined by the corresponding cell of the 
grid. 



This isotherm function here depends on 2(to + 
1) parameters: the generalized Langmuir coefficients 
(Ki c) _r, , , „ and the generalized saturation coef- 
ficients (N*)5=i^2- 
• The Lattice isotherm |17| is a generalization of Lang- 
muir isotherm that also considers interactions among 
the different sites of the porous medium. Depending 
on the degree d of interactions (number of interact- 
ing sites grouped together), this model depends, addi- 
tionally to the Langmuir coefficients (Ki)jg[i ,,„] and 
the saturation coefficient N*, on interaction energies 
(.'^ij)^,j&[o,d],2<^+J<d resulting in J]" i ^ parame- 
ters. For instance, for one component (m — 1) and 



degree 2, this gives: 



Hi(c) 



N* 



Ki c 



■(Ki c) 



1 + 2Ki c + e--nT-( 



(8) 



■(Ki c)2 

where T is the absolute temperature and R is the 
universal gas constant. Note that in all cases, a Lattice 
isotherm with energies simplifies to the Langmuir 
isotherm with the same Langmuir and saturation co- 
efficients up to a factor i. 

IV. Approach Description 

A. Motivations 

Previous works on parametric optimization of the chro- 
matography inverse problem have used gradient-based ap- 
proaches lfT3l . lfT2l . In ifTSl . the gradient of J' is obtained by 
writing and solving numerically the adjoint problem, while 
direct differentiation of the discretized equation have also 
been investigated in lfT2l . However the fitness function to 
optimize is not necessarily convex and no results are provided 
for differentiability. Moreover, experiments performed in [ 12] 
suggest that the function is multimodal, since the gradient 
algorithm converges to different local optima depending 
on the starting point. Evolutionary algorithms (EAs) are 
stochastic global optimization algorithms, less prone to get 
stuck in local optima than gradient methods, and do not rely 
on convexity assumptions. Thus they seem a good choice to 
tackle this problem. Among EAs, Evolution Strategies have 
been specifically designed for continuous optimization. The 
next section introduces the state of the art EA for continuous 
optimization, the covariance matrix adaptation ES (CMA- 
ES). 

B. The CMA Evolution Strategy 

CMA-ES is a stochastic optimization algorithm specif- 
ically designed for continuous optimization |9|, [8], [7J, 
13]. At each iteration g, a population of points of an n- 
dimensional continuous search space (subset of R"), is sam- 
pled according to a multi-variate normal distribution. Evalu- 
ation of the fitness of the different points is then performed, 
and parameters of the multi-variate normal distribution are 
updated. 

More precisely, let (x)^^ denotes the mean value of the 
(normally) sampling distribution at iteration g. Its covariance 
matrix is usually factorized in two terms: cr'f^ G K+, also 
called the step-size, and C^^\ a definite positive n x n 
matrix, that is abusively called the covariance matrix. The 
independent sampling of the A offspring can then be written: 



)l'^+A4.(0,(a(f))2c(3)) forfc = l,. 



where Afk (0, AI) denote independent realizations of the 
multi-variate normal distribution of covariance matrix M. 
The /i best offspring are recombined into 



E-(s+i) 



(9) 



where the positive weights e M are set according to 
individual ranks and sum to one. The index i : A denotes 
the i-th best offspring. Eq.|9]can be rewritten as 



(x) 



(9) 
W 



The covariance matrix C^^-* is a positive definite symmetric 
matrix. Therefore it can be decomposed in 

C(9) = _B(9)_D(9)d(9) (^B^9)Y ^ 

where B^^^ is an orthogonal matrix, i.e. B'^^^ (S^^))"^ = 
Id and £)(9) ^ diagonal matrix whose diagonal contains the 
square root of the eigenvalues of C^^^ 

The so-called strategy parameters of the algorithm, the 
covariance matrix C'^' and the step-size cr'^^ are updated 
so as to increase the probability to reproduce good steps. 
The so-called rank-one update for C'^-' [9| takes place as 
follows. First, an evolution path is computed: 

where Cc g]0, 1] is the cumulation coefficient and ^cft is a 
strictly positive coefficient. This evolution path can be seen 
as the descent direction for the algorithm. 

Second the covariance matrix C'^' is "elongated" in the 
direction of the evolution path, i.e. the rank-one matrix 

Uc'^^^Y is added to C^a), 



= (1 - Ceov)C(3) 



f-covi-'c 



where Ccov S]0,1[. The complete update rule for the co- 
variance matrix is a combination of the rank-one update 
previously described and the rank-mu update presented in 

The update rule for the step-size cr^^'' is called the path 
length control. First, another evolution path is computed: 



^9+1) = (1 _ c,)p<9) 



r(9) 



'W 



/w 



(11) 



where g]0, 1]. The length of this vector is compared to 
the length that this vector would have had under random 
selection, i.e. in a scenario where no information is gained 
from the fitness function and one is willing to keep the 
step-size constant. Under random selection the vector jf^ ^ is 
distributed as JV{0,Id). Therefore, the step-size is increased 
if the length of pi^^ is larger than E(|| Af{0,ld) ||) and 
decreased if it is shorter Formally, the update rule reads: 



^(9+1) ^ ^(9) 




^9+1) 



II) 



- 1 



(12) 



where > is a damping factor. 



The default parameters for CMA-ES were carefully de- 
rived in (T\, Eqs. 6-8. The only problem-dependent parame- 
ters are (a^)^' and cr'''^ and, to some extend, the offspring 
size A: its default value is [4 + 31og(ri)J (the /i default value 
is [-IJ), but increasing A increases the probability to converge 
towards the global optimum when minimizing multimodal 
fitness functions |7|. 

This fact was systematically exploited in |3|, where a 
"CMA-ES restart" algorithm is proposed, in which the pop- 
ulation size is increased after each restart. Different restart 
criteria are used: 

1) RestartTolFun: Stop if the range of the best objective 
function values of the recent generation is below than 
a TolFun value. 

2) RestartTolX: Stop if the standard deviation of the 
normal distribution is smaller than a TolX value and 
apc is smaller than TolX in all components. 

3) RestartOnNoEjfectAxis: Stop if adding a 0.1 standard 
deviation vector in a principal axis direction of C'^^ 
does not change (x)^ ■ 

4) RestartCondCov. Stop if the condition number of the 
covariance matrix exceeds a fixed value. 

The resulting algorithm (the CMA-ES restart, simply denoted 
CMA-ES in the remainder of this paper) is a quasi parameter 
free algorithm that performed best for the CEC 2005 special 
session on parametric optimization |[T1. 

An important property of CMA-ES is its invariance to 
linear transformations of the search space. Moreover, because 
of the rank-based selection, CMA-ES is invariant to any 
monotonous transformation of the fitness function: optimiz- 
ing f 01 ho f is equivalent, for any rank-preserving function 
ft, : R ^ R. In particular, convexity has no impact on the 
actual behavior of CMA-ES. 

C. CMA-ES Implementation 

This section describes the specific implementation of 
CMA-ES to identify n isotherm coefficients. For the sake 
of clarity we will use a single index in the definition of the 
coefficients of the isotherm, i.e we will identify K^, 
and Ec for a G [1, A], b € [1, B] and c e [1, C] where A, B 
and C are integers summing up to n. 

Fitness function and CFL condition: The goal is to 
minimize the fitness function defined in Section ITlI-AI In the 
case where identification is done using only one experimental 
chromatogram, the fitness function is the function J defined 
in Eq.|5]as the least squared difference between an exper- 
imental chromatogram Cexp{t) obtained using experimental 
conditions Cq, Cmj and a numerical approximation of the 
solution of system ([U for a candidate isotherm function 
H using the same experimental conditions. The numerical 
simulation of a solution of Eq.[T]is computed with a Godunov 
scheme written in C++ (see ifTSl for the details of the 
implementation) . 

In order to validate the CMA-ES approach, first "ex- 
perimental" chromatograms were in fact computed using 



numerical simulations of Eq.[T]with different experimental 
conditions. Let Fs,;„i denotes the flux function used to 
simulate the experimental chromatogram. For the simulation 
of an approximated solution of Eq.[T] a time step At and a 
CFL coefficient strictly smaller than one (typically 0.8) are 
fixed beforehand. The quantity max Sp(|F'j,j,„(c)|) is then 
estimated using a power method, and the space step Az can 
then be set such that Eq.[3]is satisfied for 'Fsim- The same 
At and Az are then used during the optimization of J. 

When Cexp comes from real data, an initial value for the 
parameters to estimate, i.e. an initial guess given by the 
expert is used to set the CFL condition 

Using expert knowledge: The choice of the type of 
isotherm function to be identified will be, in most cases, 
given by the chemists. Fig [T] illustrates the importance of 
this choice. In Fig [T}(a)^ the target chromatogram c^xp is 
computed using a Langmuir isotherm with one component 
(to — 1 and thus n = 2). In Fig [I]-(b), the target chro- 
matogram Cexp is computed using a Lattice of degree 3 with 
one component (m = 1 and thus n — 4). In both cases, the 
identification is done using a Langmuir model, with n — 2. 
It is clear from the figure that one is able to correctly identify 
the isotherm, and hence fit the "experimental" chromatogram 
when choosing the correct model (Fig [T] (a)) whereas the fit 
of the chromatogram is very poor when the model is not 
correct (Fig[T](b)). 

Another important issue when using CMA-ES is the initial 
choice for the covariance matrix: without any information, 
the algorithm starts with the identity matrix. However, this 
is a poor choice in case the different variables have very 
different possible order of magnitude, and the algorithm will 
spend some time adjusting its principal directions to those 
ranges. 

In most cases of chromatographic identification, however, 
chemists provide orders of magnitudes, bounds and 
initial guesses for the different values of the unknown 
parameters. Let [(Ka)mm,( 

and [(Ec)min, i^c)max\ the ranges guessed by the chemists 
for respectively each Ka, and Ec. All parameters are 
linearly scaled into those intervals from [—1,1], removing 
the need to modify the initial covariance matrix of CMA-ES. 

Unfeasible solutions: Two different situations can lead 
to unfeasible solutions: 

First when one parameter at least, among parameters 
which have to be positive, becomes negative (remember that 
CMA-ES generates offspring using an unbounded normal 
distribution), the fitness function is arbitrarily set to 10^°. 

Second when the CFL condition is violated, the simulation 
is numerically unstable, and generates absurd values. In this 
case, the simulation is stopped, and the fitness function is 
arbitrarily set to a value larger than 10^. Note that a better 
solution would be to detect such violation before running 
the simulation, and to penalize the fitness by some amount 
that would be proportional to the actual violation. But it is 
numerically intractable to predict in advance if the CFL is 



(a) Simulation using a Lang- (b) Simulation using a Lattice 

muir isotherm, identification us- isotherm, identification using a 

ing a Langmuir model: the chro- Langmuir model: poor fit of the 

matogram is perfectly fit. chromatogram. 

Fig. 1 

Importance of the choice of model (one component mixture) 



going to be violated (see Eq.|3]l, and the numerical absurd 
values returned in case of numerical instability are not 
clearly correlated with the amount of violation either. 

Initialization: The initial mean (x)^' for CMA-ES 
is uniformly drawn in [—1,1]", i.e., the parameters Ka, 

and Ec are uniformly drawn in the ranges given by 
the expert. The initial step-size (Tq is set to 0.3. Besides 
we reject individuals of the population sampled outside 
the initial ranges. Unfeasible individuals are also rejected 
at initialization: at least one individual should be feasible 
to avoid random behavior of the algorithm. In both cases, 
rejection is done by resampling until a "good" individual 
is got or a maximal number of sampling individuals is 
reached. Initial numbers of offspring A and parents /i are 
set to the default values (A = [4 + 3 log(n)J and fi = L'^/2J)- 

Restarting and stopping criteria: The algorithm stops 
if it reaches 5 restarts, or a given fitness value (typically a 
value between 10^^ and 10^^'' for artificial problems, and 
adjusted for real data). Restart criteria (see Section ITV-Bb are 
RestartTolFun with TolFun= 10" x a^°\ RestaitToLX with 
ToLX= 10~i2 X (t("), RestartOnNoEffectAxis and Restart- 
CondCov with a limit upper bound of lO^'' for the condition 
number The offspring size A is doubled after each restart 
and fi is set equal to [A/2 J. 

V. Results 
A. Validation using artificial data 

A first series of validation runs was carried out using sim- 
ulated chromatograms. Each identification uses one or many 
experimental chromatograms. Because the same discretiza- 
tion is used for both the identification and the generation of 
the "experimental" data, one solution is known (the same 
isotherm that was used to generate the data), and the best 
possible fitness is thus zero. 

Several tests were run using different models for the 
isotherm, different number of components, and different 
numbers of time steps. In all cases, CMA-ES identified the 
correct parameters, i.e. the fitness function reaches values 
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f_recert^2.449E-15 

Fig. 2 

Single component mixture, 1000 time steps. Simulate a 
Lattice (5 parameters) and identify a Lattice of degree 4 (5 
parameters): best fitness versus number of evaluations. the 
first run gave a satisfactory solution but two restarts have 
been performed to reach a fitness value (2.4 10"^^) lower 

THAN 10"^'*. 




Fig. 3 

binary component mixture, 500 time steps . simulate a 
Langmuir (3 parameters) and identify a Lattice of degree 3 
(10 parameters): Best fitness versus number of evaluations. 
The first run gave a satisfactory solution but the maximal 

number (here five) of restarts have been performed 
attempting to reach a fitness value of lo"'^*, the best fitness 
value (1.4 lo"'^*) was reached in the fourth restart. 



very close to zero. In most cases, CMA-ES did not need 
any restart to reach a precision of (lO"^''), though this was 
necessary in a few cases. This happened when the whole 
population remained unfeasible during several generations, 
or when the algorithm was stuck in a local optimum. Fig- 
ures |2] S m show typical evolutions during one run of the 
best fitness value with respect to the number of evaluations, 
for problems involving respectively 1, 2 or 3 components. 
Figure |4] is a case where restarting allowed the algorithm to 
escape a local optimum. 

Specific tests were then run in order to study the influence 
of the expert guesses about both the ranges of the variables 
and the starting point of the algorithm possibly given by 
the chemical engineers: In CMA-ES, in a generation g, 
offspring are drawn from a Gaussian distribution centered 
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Fig. 4 

Ternary component mixture, 2000 time steps. Simulate a 

LANGMUIR (4 PARAMETERS) AND IDENTIFY A LANGMUIR (4 
PARAMETERS): BEST FITNESS VERSUS NUMBER OF EVALUATIONS. TWO 
RESTARTS WERE NECESSARY: BEFORE THE SECOND RESTART, CMA-ES 

IS STUCK IN SOME LOCAL OPTIMA (FITNESS OF ORDER OF 10"^), IN 
THE SECOND RESTART, THE ALGORITHM REACHES A FITNESS VALUE OF 

9.9 10-15. 



on the mean {x)\^' . An expert guess for a good solution 
can hence be input as the mean of the first distribution 
(a^)^'' that will be used to generate the offspring of the 
first generation. The results are presented in Table U First 
3 lines give the probabilities that a given run converges (i.e., 
reaches a fitness value of 10~^^), computed on 120 runs, 
and depending on the number of restarts (this probability 
of course increases with the number of restarts). The last 
line is the ratio between the average number of evaluations 
that were needed before convergence (averaged over the runs 
that did converge), and the probability of convergence: this 
ratio measures the performance of the different experimental 
settings, as discussed in details in |2|. 

The results displayed in Table U clearly demonstrate that 
a good guess of the range of the variables is the most 
prominent factor of success: even without any hint about 
the starting point, all runs did reach the required precision 
without any restart. However, when no indication about the 
range is available, a good initial guess significantly improves 
the results, without reaching the perfect quality brought by 
tight bounds on the ranges: scaling is more important than 
rejecting unfeasible individuals at the beginning. 

Computational cost: The duration of an evaluation 
depends on the discretization of the numerical scheme (num- 
ber of space- and time-steps), and on the number n of 
unknown parameters to identify. Several runs were precisely 
timed to assess the dependency of the computational cost 
on both factors. The simple Langmuir isotherm was used 
to both generate the data and identify the isotherm. Only 
computational costs of single evaluations are reported, as the 
number of evaluations per identification heavily depends on 
many parameters, including the possible expert guesses, and 
in any case is a random variable of unknown distribution. 
All runs in this paper were performed on a 1.8GHz Pentium 



TABLE I 

On the USEFULNESS OF EXPERT KNOWLEDGE: TARGET VALUES FOR 
LANGMUIR ISOTHERM ARE HERE (Ki,N*) = (0.0388, 107). EXPERT 
RANGE IS [0.01,0.05] X [50, 150], WIDE RANGE IS [0.001,1] X [50, 150]. 
The EXPERT GUESS FOR THE STARTING POINT IS A BETTER INITIAL 
MEAN (ACCORDING TO FITNESS VALUE) THAN RANDOM. THE FIRST 3 
LINES GIVE THE PROBABILITIES (COMPUTED OVER 120 RUNS) TO 
REACH A 10-1^ FITNESS VALUE WITHIN THE GIVEN NUMBER OF 
RESTARTS. THE LAST LINE IS THE RATIO OF THE NUMBER OF 
EVALUATIONS NEEDED FOR CONVERGENCE (AVERAGED OVER THE RUNS 
THAT DID CONVERGE) BY THE PROBABILITY OF CONVERGENCE AFTER 
TWO RESTARTS (LINE 3). 



Range 


Expert range 


Wide range 


Wide range 


Starting point 


No guess 


No guess 


Expert guess 


no restart 


1 


0.84 


0.95 


1 restart 


1 


0.92 


0.97 


2 restarts 


1 


0.95 


0.97 


Peif. 


601 


1015 


905 



computer running with a recent Linux system. 

For one component (m — 1, n — 2), and 100, 500 and 
1000 time steps, the averages of the durations of a single 
evaluation are respectively 0.0097, 0.22, and 0.9 seconds, 
fitting the theoretical quadratic increase with the number of 
time steps (though 3 sample points are too few to demonstrate 
anything!). This also holds for the number of space steps as 
the number of space steps is proportional to the number of 
time steps due to the CFL condition. For an identification 
with a 1 -component Langmuir isotherm, the total cost of the 
identification is on average 540 seconds for a 1000 time steps 
discretization. 

When looking at the dependency of the computational 
cost on the number of unknown parameters, things are not 
that clear from a theoretical point of view, because the cost 
of each computation of the isotherm function also depends 
on the number of components and on the number of ex- 
perimental chromatograms to compare with. Experimentally, 
for, 2, 3 and 4 variables, the costs of a single evaluation 
are respectively 0.9, 1.04, and 2.2 seconds (for a 1000 time 
steps discretization). For an identification, the total time is 
roughly 15 to 25 minutes for 2 variables, 40 to 60 minutes 
for 3 variables, and 1 to 2 hours for 4 variables. 

B. Experiments on real data 

The CMA-ES based approach has also been tested on 
a set of data taken from |10|. The mixture was composed 
of 3 chemical species: the benzylalcohol (BA), the 2- 
phenylethanol (PE) and the 2-methylbenzylalcohol (MBA). 
Two real experiments have been performed with different 
proportions of injected mixtures, with respective proportions 
(1,3,1) and (3,1,0). Consequently, two real chromatograms 
have been provided. For this identification, Quinones et a. I. 
ifTOl have used a modified Langmuir isotherm model in which 



TABLE n 

Comparing CMA-ES and gradient: the 3-parameters case. 
Solution ( line 1) and associated fitness values ( line 2) for 

THE modified LANGMUIR MODEL (EQ.QIJ. LINE 3: FOR CMA-ES, 
"MEDIAN (MINIMAL)" NUMBER OF FITNESS EVALUATIONS (OUT OF 12 
RUNS) NEEDED TO REACH THE CORRESONDING FITNESS VALUE ON LINE 
2. FOR GRADIENT, "NUMBER OF FITNESS EVALUATIONS - NUMBER OF 
GRADIENT EVALUATIONS" FOR THE BEST OF THE 10 RUNS DESCRIBED 
IN (12). 

CMA-ES Gradient 



A:PE=L3:1, VL=LOml 



BA:PE=3:L VL=0.5iiil 



N* 

Fitness X 10^ 
# Fit evals. 



(120.951,135.319,165.593) (123.373,135.704,159.637) 

8.96 8.78 8.96 

175 (70) 280 (203) 140 - 21 



each species has a different saturation coefficient N*: 
N* 



H,(c) 



K, 



1, 



(13) 



Six parameters are to be identified: N* and K^, for i — 
1, . . . , 3. A change of variable has been made for those tests 
SO that the unknown parameters are in fact N* and K^, where 
Kj = Ki N^: those are the values that chemical engineers 
are able to experimentally measure. 

Two series of numerical tests have been performed using 
a gradient-based method lfT2l : identification of the whole 
set of 6 parameters, and identification of the 3 saturation 
coefficients N* only, after setting the Langmuir coefficients 
to the experimentally measured values (Kj^jKjjKg) ~ 
(1.833,3.108,3.511). The initial ranges used for CMA- 
ES are [60,250] x [60,250] x [60,250] (resp. [1.5,2.5] x 
[2.7,3.7] X [3,4] x [90,200] x [100,200] x [100, 210]) when 
optimizing 3 parameters (resp. 6 parameters). Comparisons 
between the two experimental chromatograms and those 
resulting from CMA-ES identification for the two experi- 
ments are shown in Figure |5] for the 6-parameters case. 
The corresponding plots in the 3-parameters case are visually 
identical though the fitness value is slightly lower than in the 
6-parameters case (see Tables HI] and HIH i. But another point 
of view on the results is given by the comparison between 
the identified isotherms and the (few) experimental values 
gathered by the chemical engineers. The usual way to present 
those isotherms in chemical publications is that of Figure |6] 
the absorbed quantity H(c)i of each component i = 1,2,3 
is displayed as a function of the total amount of mixture 
(ci -I- C2 + C3), for five different compositions of the mixture 
lfT2l . Identified (resp. experimental) isotherms are plotted in 
Figure |6] using continuous lines (resp. discrete markers), for 
the 6-parameters case. Here again the corresponding plots 
for the 3-parameters case are visually identical. 

C. Comparison with a Gradient Method 

CMA-ES results have then been compared with those of 
the gradient method from [12], using the same data case of 
ternary mixture taken from [101 and described in previous 
Section. Chromatograms found by CMA-ES are, according 
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Fig. 5 

Experimental chromatograms (markers) and identified 
chromatograms (continuous line) for the ba, be and mba 
species. Plots on the left/right correspond to an injection 

WITH PROPORTIONS (1,3, l)/(3, 1,0). 



TABLE III 

Comparing CMA-ES and gradient: the 6-parameters case. 
Solutions ( lines 1 and 2) and associated fitness values ( line 
3) for the modified langmuir model (eo.fth. 

CMA-ES Gradient 



Fitness X lO'' 



(1.861,3.120,3.563) 
(118.732,134.860,162.498) 
8.32 



(1.780,3.009,3.470) 
(129.986,141.07,168.495) 
10.7 



to the fitness (see Tables HJandllllll. closer to the experimental 
ones than those obtained with the gradient method. Moreover, 
contrary to the gradient algorithm, all 12 independent runs of 
CMA-ES converged to the same point. Thus, no variance is 
to be reported on Tables Ull and Hill Furthermore, there seems 
to be no need, when using CMA-ES, to fix the 3 Langmuir 
coefficients in order to find good results: when optimizing 
all 6 parameters, the gradient approach could not reach a 
value smaller than 0.01, whereas the best fitness found by 
CMA-ES in the same context is 8.32 10"^ (Table HiH). 

Finally, when comparing the identified isotherms to the 
experimental ones (figure |6]l, the fit is clearly not very 
satisfying (similar deceptive results were obtained with the 
gradient method in [12]): Fitting both the isotherms and the 
chromatograms seem to be contradictory objectives. Two 
directions can lead to some improvements in this respect: 
modify the cost function J' in order to take into account 
some least-square error on the isotherm as well as on the 
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Isotherms associated to parameters values of TableHiiI 
(continuous line) and experimental ones (markers) versus 
total amount of the mixture for different proportions of the 
component in the injected concentration h2i . 



chroma tograms; or use a multi-objective approach. Both 
modifications are easy to implement using Evolutionary 
Algorithms (a multi-objective version of CMA-ES was 
recently proposed ifTTl ). while there are beyond what 
gradient-based methods can tackle. However, it might also 
be a sign that the modified Langmuir model that has been 
suggested for the isotherm function is not the correct one. 

Comparison of convergence speeds: Tables HIl and Hill 
also give an idea of the respective computational costs of 
both methods on the same real data. For the best run out of 
10, the gradient algorithm reached its best fitness value after 
21 iterations, requiring on average 7 evaluations per iteration 
for the embedded line search. Moreover, the computation 
of the gradient itself is costly - roughly estimated to 4 
times that of the fitness function. Hence, the total cost of 
the gradient algorithm can be considered to be larger than 
220 fitness evaluations. To reach the same fitness value 
(8.9610^3), CMA-ES only needed 175 fitness evaluations 
(median value out of 12 runs). To converge to its best 
value (8.78 10"^^ ^est run out of 12) CMA-ES needed 280 
fitness evaluations. Those results show that the best run of 
the gradient algorithms needs roughly the same amount of 
functions evaluations than CMA-ES to converge. Regarding 
the robustness issue, note that CMA-ES always reached the 
same fitness value, while the 10 different runs of the gradient 
algorithm from 10 different starting points gave 10 different 
solutions: in order to assess the quality of the solution, more 
runs are needed for the gradient method than for CMA-ES! 



VI. Conclusions 

This paper has introduced the use of CMA-ES for the 
parametric identification of isotherm functions in chromatog- 
raphy. Validation tests on simulated data were useful to 
adjust the (few) CMA-ES parameters, but also demonstrated 
the importance of expert knowledge: choice of the type of 
isotherm, ranges for the different parameters, and possibly 
some initial guess of a not-so-bad solution. 

The proposed approach was also applied on real data and 
compared to previous work using gradient methods. On this 
data set, the best fitness found by CMA-ES is better than 
that found by the gradient approach. Moreover, the results 
obtained with CMA-ES are far more robust: (1) CMA- 
ES always converges to the same values of the isotherm 
parameters, independently of its starting point; (2) CMA-ES 
can handle the full problem that the gradient method failed 
to efficiently solve: there is no need when using CMA-ES to 
use experimental values of the Langmuir parameters in order 
to obtain a satisfactory fitness value. Note that the fitness 
function only takes into account the fit of the chromatograms, 
resulting in a poor fit on the isotherms. The results confirm 
the ones obtained with a gradient approach, and suggest to 
either incorporate some measure of isotherm fit in the fitness, 
or to try some multi-objective method - probably the best 
way to go, as both objectives (chromatogram and isotherm 
fits) seem somehow contradictory. 
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